Techniques for Language Identification for Hybrid Arabic-English Document Images
نویسندگان
چکیده
Because of the different characteristics of Arabic language and Romance and Anglo Saxon languages, recognition of documents written in hybrid of these languages requires that the language of the text to be identified priori to the recognition phase. In this paper, three efficient techniques that can be used to discriminate between text written in Arabic script and text written in English script are presented and evaluated. These techniques addresses the language identification problem on the word level and on textline level. The characteristics of horizontal projection profiles as well as runlength histograms for text written in both languages are the basic features underlying these techniques. Solving this problem is very important in building bilingual document image analysis systems which are capable of processing documents containing hybrid Arabic/Romance and Anglo Saxon languages.
منابع مشابه
Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model
In order to facilitate the entry of data into the computer and its digitalization, automatic recognition of printed texts and manuscripts is one of the considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...
متن کاملLanguage identification for handwritten document images using a shape codebook
Language identification for handwritten document images is an open document analysis problem. In this paper, we propose a novel approach to language identification for documents containing mixture of handwritten and machine printed text using image descriptors constructed from a codebook of shape features. We encode local text structures using scale and rotation invariant codewords, each repres...
متن کاملFormulation of Language Teachers̕ Identity in the Situated Learning of Language Teaching Community of Practice
A community of practice may shape and reshape the identity of members of the community through providing them with situated learning or learning environment. This study, therefore, is to clarify the salient learning-based features of the language teaching community of practice that might formulate the identity of language teachers. To this end, the study examined how learning situations in two ...
متن کاملAn Analysis of Ministry of Education’s Strategic Plans Based on Favorable Components of English Language Teaching Using Shannon’s Entropy
The present research aims to analyze the content of Ministry of Education’s strategic plans (the Fundamental Reform Document of Education, the Comprehensive National Scientific Plan and the National Curriculum Document) based on Shannon's entropy regarding the favorable components of teaching English. The contents of the Fundamental Reform Document of Education, the Comprehensive National Scien...
متن کاملIntegrating morpho-syntactic features in English-Arabic statistical machine translation
This paper presents a hybrid approach to the enhancement of English to Arabic statistical machine translation quality. Machine Translation has been defined as the process that utilizes computer software to translate text from one natural language to another. Arabic, as a morphologically rich language, is a highly flexional language, in that the same root can lead to various forms according to i...
متن کامل